Abstract. A Bernstein-type exponential inequality for (generalized) canonical U-statistics of order 2 is obtained and the Rosenthal and Hoffmann-Jørgensen inequalities for sums of independent random variables are extended to (generalized) U-statistics of any order whose kernels are either nonnegative or canonical.

1. Introduction 

Exponential inequalities, such as Bernstein's and Prohorov's, and moment inequalities, such as Rosenthal's and Hoffmann-Jørgensen's, are among the most basic tools for the analysis of sums of independent random variables. Our object here consists in developing analogues of such inequalities for generalized U-statistics, in particular, for U-statistics and for multilinear forms in independent random variables.

Hoffmann-Jørgensen type moment inequalities for canonical (that is, completely degenerate) U-statistics of any order m were first considered by Giné and Zinn (1992), and their version for U-statistics with nonnegative kernels turned out to be useful for obtaining best possible necessary integrability conditions in limit theorems for U-statistics. (By Khinchin's inequality it is irrelevant whether one considers canonical or nonnegative kernels in moment inequalities, at least if multiplicative constants are not at issue.) Klass and Nowicki (1997) also obtained moment inequalities for nonnegative generalized U-statistics, but only for order m = 2, and their decomposition of the moments is more complete than that in Giné and Zinn (1992). Ibragimov and Sharakhmetov (1998, 1999) recently obtained analogues of Rosenthal's inequality for nonnegative and for canonical U-statistics. The moment inequalities we present in the first part of this article, valid for canonical and for nonnegative generalized U-statistics of any order m, when specialized to m = 2, represent the same level of moment decomposition as the Klass-Nowicki inequalities, coincide with theirs for powers p > 1 (except for constants) and are expressed in terms of different, simpler quantities for powers p < 1. Proposition 2.1 below, which constitutes the first step towards more elaborate bounds such as those in Theorem 2.3 below, has also been obtained, up to constants, by Ibragimov and Sharakhmetov. Our proofs consist of simple iterations of the classical moment inequalities for sums of independent random variables.

The moment inequalities in the first part of this article do imply exponential bounds for canonical U-statistics of any order and with bounded kernels which are sharper than those in Arcones and Giné (1993); however, they are not of the best kind as they do not exhibit Gaussian behavior for part of the tail, which they should in view of the tail behavior of Gaussian chaos.

In the second part of this article we improve the moment inequalities from the first part in the case of generalized canonical U-statistics of order 2, and for moments of order p > 2 (Theorem 3.2). The bounds involve not only moments but also the $L_2$ operator norm of the matrix of kernels. Then we show how these improved moment inequalities imply what we believe is the correct analogue (up to constants) of Bernstein's exponential inequality for generalized canonical U-statistics of order 2 (Theorem 3.3). This exponential inequality, which does exhibit Gaussian behavior for small values of t, is strong enough to imply the law of the iterated logarithm for canonical U-statistics under conditions which are also necessary. The main new ingredient in this part of the paper is Talagrand's (1996) exponential bound for empirical processes, which gives a Rosenthal-Pinelis type inequality for moments of empirical processes (Proposition 3.1) basic for the derivation of the moment inequality for U-statistics of order 2.

Because of the decoupling results of de la Peña and Montgomery-Smith (1995), we can work with decoupled U-statistics, and this allows us to proceed by conditioning and iteration.

2. Moment inequalities 
We consider estimation of moments of generalized decoupled U-statistics, defined as

$$\sum_{1\le i_1,\dots,i_m\le n} h_{i_1,\dots,i_m}\bigl(X^{(1)}_{i_1},\dots,X^{(m)}_{i_m}\bigr), \tag{2.1}$$

where the random variables $X_i^{(j)}$, $1\le i\le n$, $1\le j\le m$, $m\le n$, are independent (not necessarily with the same distribution) and take values in a measurable space $(S,\mathcal S)$, and $h_{i_1,\dots,i_m}$ are real valued measurable functions on $S^m$. For short, this sum is denoted by $\sum_{\mathbf i} h_{\mathbf i}$.

Given $J\subseteq\{1,\dots,m\}$ ($J=\emptyset$ is not excluded), and $\mathbf i=(i_1,\dots,i_m)\in\{1,\dots,n\}^m$, we set $\mathbf i_J$ to be the point of $\{1,\dots,n\}^{|J|}$ obtained from $\mathbf i$ by deleting the coordinates in the places not in $J$ (e.g., if $\mathbf i=(3,4,2,1)$ then $\mathbf i_{\{1,3\}}=(3,2)$). Also, $\sum_{\mathbf i_J}$ indicates sum over $1\le i_j\le n$, $j\in J$ (for instance, if $m=4$ and $J=\{1,3\}$, then

$$\sum_{\mathbf i_J} h_{\mathbf i}=\sum_{\mathbf i_{\{1,3\}}} h_{i_1,i_2,i_3,i_4}=\sum_{1\le i_1,i_3\le n} h_{i_1,i_2,i_3,i_4}\bigl(X^{(1)}_{i_1},\dots,X^{(4)}_{i_4}\bigr)).$$

By convention, $\sum_{\mathbf i_\emptyset} a=a$.

Likewise, while $E$ will denote expected value with respect to all the variables, $E_J$ will denote expected value only with respect to the variables $X_i^{(j)}$ with $j\in J$ and $i\in\{1,\dots,n\}$. By convention, $E_\emptyset a=a$.
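The index conventions above can be made concrete with a small sketch. The helper names `restrict` and `partial_sum` are ours, and the "kernel" below is a hypothetical deterministic function of the index tuple only (no random variables), chosen so that the sums can be checked by hand.

```python
from itertools import product

def restrict(i, J):
    """Return i_J: keep the coordinates of i whose 1-based positions lie in J."""
    return tuple(i[k - 1] for k in sorted(J))

def partial_sum(h, i, J, n):
    """Sum h over the coordinates of i in positions J (each from 1 to n),
    keeping the remaining coordinates of i fixed."""
    positions = sorted(J)
    total = 0
    for values in product(range(1, n + 1), repeat=len(positions)):
        idx = list(i)
        for pos, v in zip(positions, values):
            idx[pos - 1] = v
        total += h(tuple(idx))
    return total

# Hypothetical toy kernel on index tuples, for illustration only.
toy_kernel = lambda idx: idx[0] * idx[2] + idx[1]

print(restrict((3, 4, 2, 1), {1, 3}))                    # the example in the text
print(partial_sum(toy_kernel, (3, 4, 2, 1), {1, 3}, 2))  # sum over i_1, i_3 in {1, 2}
print(partial_sum(toy_kernel, (3, 4, 2, 1), set(), 2))   # empty J: the sum of a is a
```

With an empty `J` the loop runs exactly once, which matches the convention that the empty sum of a is a.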

Rosenthal's inequality is easiest to extend to U-statistics because it involves only moments of sums (as opposed to moments of maxima and quantiles for Hoffmann-Jørgensen's inequality). So, we will first obtain analogues of Rosenthal's inequality, and then we will transform these inequalities into analogues of Hoffmann-Jørgensen's by first showing that some moments of sums can be replaced by moments of maxima, and then, that the lowest moment can in fact be replaced by a quantile. We will illustrate this three-step procedure first in the case of nonnegative kernels and moments of order p > 1. Then we will see that this also solves, via Khinchin's inequality, the case of canonical kernels and moments of order p > 2. Finally, we will consider the case of moments of order p < 1 for positive kernels and p < 2 for canonical, cases in which the inequalities are less neat, but still useful. We will pay some attention to the behavior of the constants as $p\to\infty$ in these inequalities since such behavior translates into (exponential) integrability properties.

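2.1. Nonnegative kernels, moments of order p > 1. For nonnegative independent random variables $\xi_i$ we have the following two improvements of Rosenthal's inequalities, valid for p > 1:
1) Latała's, 1997:

$$E\Bigl(\sum_i\xi_i\Bigr)^p\le(2e)^p\max\Bigl[\frac{e^p}{p}\sum_i E\xi_i^p,\ e^p\Bigl(\sum_i E\xi_i\Bigr)^p\Bigr],\quad p>1, \tag{$R_1$}$$

(see Pinelis (1994) for the corresponding inequality when the random variables are centered);

2) Johnson, Schechtman and Zinn's, 1985:

$$E\Bigl(\sum_i\xi_i\Bigr)^p\le K^p\Bigl(\frac{p}{\log p}\Bigr)^p\max\Bigl[\sum_i E\xi_i^p,\ \Bigl(\sum_i E\xi_i\Bigr)^p\Bigr],\quad p>1, \tag{$R_2$}$$

where K is a universal constant. See Utev (1985) and Figiel, Hitczenko, Johnson, Schechtman and Zinn (1997) for more precise inequalities of the same type.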
And for general p > 0, we have the following improved Hoffmann-Jørgensen inequality, that follows from Kwapień and Woyczyński (1992) and which can be obtained as in the proof of Theorem 1.2.3 in de la Peña and Giné (1999):
3)

$$E\Bigl\|\sum_i\xi_i\Bigr\|^p\le 2^{p-2}\cdot 2^{(p-1)\vee 0}\cdot(p+1)^{p+1}\Bigl[t_0^p+E\max_i\|\xi_i\|^p\Bigr],\quad p>0, \tag{$H$}$$

where

$$t_0:=\inf\Bigl\{t>0:\ \Pr\Bigl\{\Bigl\|\sum_i\xi_i\Bigr\|>t\Bigr\}\le\frac12\Bigr\}$$

and where we write norm for absolute value in order to include not only independent nonnegative real random variables, but also independent nonnegative random functions $\xi_i$ taking values in certain 'rearrangement invariant spaces' such as $L_s(\Omega,\Sigma,\mu)$, $0<s<\infty$, with $\|\xi\|:=\bigl(\int|\xi|^s\,d\mu\bigr)^{1/(s\vee 1)}$, or $\ell^\infty(L_s)$. Note that, by Markov,

$$t_0\le 2^{1/r}\Bigl(E\Bigl\|\sum_i\xi_i\Bigr\|^r\Bigr)^{1/r},$$

so that (H) becomes:
4) for $0<r<p<\infty$,

$$E\Bigl\|\sum_i\xi_i\Bigr\|^p\le 2^{p-2}\cdot 2^{(p-1)\vee 0}\cdot(p+1)^{p+1}\Bigl[2^{p/r}\Bigl(E\Bigl\|\sum_i\xi_i\Bigr\|^r\Bigr)^{p/r}+E\max_i\|\xi_i\|^p\Bigr]. \tag{$H_r$}$$

Inequalities (H) and ($H_r$) hold for spaces of functions which are quasinormed measurable linear spaces whose quasinorm $\|\cdot\|$ has the property that $\|x\|\le\|y\|$ whenever $0\le x\le y$.

In the following proposition we extend inequalities ($R_1$) and ($R_2$) by means of an easy induction.

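Proposition 2.1. Let $m\in\mathbb N$, $p>1$, and, for all $\mathbf i\in\{1,\dots,n\}^m$, let $h_{\mathbf i}$ be a nonnegative function of m variables whose p-th power is integrable for the law of $X_{\mathbf i}=(X^{(1)}_{i_1},\dots,X^{(m)}_{i_m})$. Then,

$$\max_{J\subseteq\{1,\dots,m\}}\sum_{\mathbf i_J}E_J\Bigl(\sum_{\mathbf i_{J^c}}E_{J^c}h_{\mathbf i}\Bigr)^p\le E\Bigl(\sum_{\mathbf i}h_{\mathbf i}\Bigr)^p\le(2e^2)^{mp}\sum_{J\subseteq\{1,\dots,m\}}p^{|J|p}\sum_{\mathbf i_J}E_J\Bigl(\sum_{\mathbf i_{J^c}}E_{J^c}h_{\mathbf i}\Bigr)^p \tag{2.2}$$

and also, there exists a universal constant $K<\infty$ such that

$$E\Bigl(\sum_{\mathbf i}h_{\mathbf i}\Bigr)^p\le\Bigl(\frac{Kp}{\log p}\Bigr)^{mp}\max_{J\subseteq\{1,\dots,m\}}\sum_{\mathbf i_J}E_J\Bigl(\sum_{\mathbf i_{J^c}}E_{J^c}h_{\mathbf i}\Bigr)^p. \tag{2.2'}$$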

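Proof. The proof of (2.2') with sum over the subsets J instead of maximum differs from that of (2.2) only in the starting point (($R_2$) instead of ($R_1$)); then, replacing sum by maximum simply increases the constant by a factor of $2^m$. The left side inequality in (2.2) follows by Hölder since p > 1. Consider the right hand side inequality. For m = 1 this is just inequality ($R_1$) and we can proceed by induction. Suppose the result holds for m − 1. By applying the induction hypothesis to

$$E\Bigl(\sum_{\mathbf i}h_{\mathbf i}\Bigr)^p=E_mE_{\{1,\dots,m-1\}}\Bigl(\sum_{\mathbf i_{\{1,\dots,m-1\}}}\Bigl(\sum_{i_m}h_{\mathbf i}\Bigr)\Bigr)^p$$

we only have to consider the generic term in the decomposition (2.2) for the new kernels $\bigl(\sum_{i_m}h_{\mathbf i}\bigr)$ with the $X^{(m)}$ variables fixed. In other words, letting $J_{m-1}$ be any subset of $\{1,\dots,m-1\}$ and $J^c_{m-1}$ its complement with respect to $\{1,\dots,m-1\}$, we must estimate

$$E_m\sum_{\mathbf i_{J_{m-1}}}E_{J_{m-1}}\Bigl(\sum_{\mathbf i_{J^c_{m-1}}}E_{J^c_{m-1}}\sum_{i_m}h_{\mathbf i}\Bigr)^p=\sum_{\mathbf i_{J_{m-1}}}E_{J_{m-1}}E_m\Bigl(\sum_{i_m}\sum_{\mathbf i_{J^c_{m-1}}}E_{J^c_{m-1}}h_{\mathbf i}\Bigr)^p.$$

Rosenthal's inequality ($R_1$) applied to the kernels $\sum_{\mathbf i_{J^c_{m-1}}}E_{J^c_{m-1}}h_{\mathbf i}$, with the variables in $J_{m-1}$ fixed, gives

$$E_m\Bigl(\sum_{i_m}\sum_{\mathbf i_{J^c_{m-1}}}E_{J^c_{m-1}}h_{\mathbf i}\Bigr)^p\le(2e^2)^p\Bigl[\Bigl(\sum_{i_m,\,\mathbf i_{J^c_{m-1}}}E_mE_{J^c_{m-1}}h_{\mathbf i}\Bigr)^p+p^p\sum_{i_m}E_m\Bigl(\sum_{\mathbf i_{J^c_{m-1}}}E_{J^c_{m-1}}h_{\mathbf i}\Bigr)^p\Bigr].$$

Upon integrating each term with respect to $E_{J_{m-1}}$ and summing over $\mathbf i_{J_{m-1}}$, we then obtain

$$\sum_{\mathbf i_{J_{m-1}}}E_{J_{m-1}}E_m\Bigl(\sum_{i_m}\sum_{\mathbf i_{J^c_{m-1}}}E_{J^c_{m-1}}h_{\mathbf i}\Bigr)^p\le(2e^2)^p\Bigl[\sum_{\mathbf i_{J_{m-1}}}E_{J_{m-1}}\Bigl(\sum_{\mathbf i_{J^c_{m-1}\cup\{m\}}}E_{J^c_{m-1}\cup\{m\}}h_{\mathbf i}\Bigr)^p+p^p\sum_{\mathbf i_{J_{m-1}\cup\{m\}}}E_{J_{m-1}\cup\{m\}}\Bigl(\sum_{\mathbf i_{J^c_{m-1}}}E_{J^c_{m-1}}h_{\mathbf i}\Bigr)^p\Bigr].$$

Multiplying by $(2e^2)^{(m-1)p}p^{|J_{m-1}|p}$, this is the sum of two terms of the form $(2e^2)^{mp}p^{|J|p}\sum_{\mathbf i_J}E_J\bigl(\sum_{\mathbf i_{J^c}}E_{J^c}h_{\mathbf i}\bigr)^p$ (for $J=J_{m-1}$ and for $J=J_{m-1}\cup\{m\}$), proving the proposition. □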

This proposition solves the problem of estimating, up to constants, the moments of a decoupled U-statistic by 'computable' expressions. For instance, if the functions $h_{\mathbf i}$ are all equal and if the variables $X_i^{(j)}$ are i.i.d., then the typical term at the right of (2.2) just becomes $n^{|J|+p|J^c|}E_J(E_{J^c}h)^p$, a 'mixed moment' of h. For m = 2 the right hand side of inequality (2.2) is just:

$$E\Bigl(\sum_{i,j}h_{i,j}\bigl(X_i^{(1)},X_j^{(2)}\bigr)\Bigr)^p\le(2e^2)^{2p}\Bigl[\Bigl(\sum_{i,j}Eh_{i,j}\bigl(X_i^{(1)},X_j^{(2)}\bigr)\Bigr)^p+p^p\sum_iE_1\Bigl(\sum_jE_2h_{i,j}\bigl(X_i^{(1)},X_j^{(2)}\bigr)\Bigr)^p+p^p\sum_jE_2\Bigl(\sum_iE_1h_{i,j}\bigl(X_i^{(1)},X_j^{(2)}\bigr)\Bigr)^p+p^{2p}\sum_{i,j}Eh_{i,j}^p\bigl(X_i^{(1)},X_j^{(2)}\bigr)\Bigr]. \tag{2.2''}$$

We have been careful with the dependence on p of the constants because it is of some interest to obtain constants of the best order as $p\to\infty$. In fact, (2.2') exhibits constants of the best order as can be seen by taking the product of two independent copies of the example in Johnson, Schechtman and Zinn (1985), Proposition 2.9.
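As a sanity check of the order-two case, the following sketch computes both sides of the two-sided bound in Proposition 2.1 exactly, by enumeration, for a toy example. Everything specific here is an assumption made for illustration: the kernels are all equal to a single function h, the variables are i.i.d. with a finite support, and the function names are ours.

```python
import itertools
import math

def terms_order_two(h, support, probs, n, p):
    """Exact E(sum_{i,j} h(X_i, Y_j))^p and the four J-terms of the m = 2 bound,
    assuming equal kernels and i.i.d. discrete variables (illustrative setup)."""
    law = list(zip(support, probs))
    # Enumerate every outcome of a sample of size n together with its probability.
    samples = []
    for combo in itertools.product(law, repeat=n):
        values = [v for v, _ in combo]
        prob = math.prod(q for _, q in combo)
        samples.append((values, prob))
    # Left side: the two samples (first and second coordinate) are independent.
    lhs = 0.0
    for xs, px in samples:
        for ys, py in samples:
            total = sum(h(x, y) for x in xs for y in ys)
            lhs += px * py * total ** p
    e2 = lambda x: sum(q * h(x, y) for y, q in law)   # E_2 h(x, .)
    e1 = lambda y: sum(q * h(x, y) for x, q in law)   # E_1 h(., y)
    eh = sum(q * e2(x) for x, q in law)               # E h
    ehp = sum(qx * qy * h(x, y) ** p for x, qx in law for y, qy in law)  # E h^p
    terms = {
        "empty": (n * n * eh) ** p,
        "{1}": n * sum(q * (n * e2(x)) ** p for x, q in law),
        "{2}": n * sum(q * (n * e1(y)) ** p for y, q in law),
        "{1,2}": n * n * ehp,
    }
    return lhs, terms

def two_sided_bounds(terms, p):
    """Lower bound (max over J) and upper bound (weighted sum over J) from (2.2), m = 2."""
    lower = max(terms.values())
    upper = (2 * math.e ** 2) ** (2 * p) * (
        terms["empty"]
        + p ** p * (terms["{1}"] + terms["{2}"])
        + p ** (2 * p) * terms["{1,2}"]
    )
    return lower, upper

# Toy example: h(x, y) = x * y with Bernoulli(1/2) variables, n = 3, p = 2.
lhs, terms = terms_order_two(lambda x, y: x * y, [0, 1], [0.5, 0.5], n=3, p=2)
lower, upper = two_sided_bounds(terms, p=2)
print(lower, lhs, upper)
```

The enumeration costs on the order of (support size)^(2n) kernel sums, so it is only meant for very small n; its purpose is to make the roles of the four subsets J visible, not to estimate anything in practice.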

Next we replace the external sums of expected values at the right side of the above inequalities by expectations of maxima without significantly altering the order of the multiplicative constants. If $\xi_i$ are independent nonnegative random variables, then,

$$\frac12\Bigl[\delta_0^p\vee\sum_iE\xi_i^pI_{\xi_i>\delta_0}\Bigr]\le E\max_i\xi_i^p\le\delta_0^p+\sum_iE\xi_i^pI_{\xi_i>\delta_0},\quad 0<p<\infty, \tag{2.3}$$

where

$$\delta_0=\inf\Bigl\{t>0:\ \sum_i\Pr\{\xi_i>t\}\le 1\Bigr\} \tag{2.4}$$

(Giné and Zinn (1983); see also de la Peña and Giné (1999), page 22). The left hand side of (2.3) gives that, for $0<r<p$ and $\xi_i$ independent,

$$\sum_iE|\xi_i|^p\le 2E\max_i|\xi_i|^p+2\Bigl(\sum_iE|\xi_i|^r\Bigr)\Bigl(E\max_i|\xi_i|^p\Bigr)^{(p-r)/p} \tag{2.5}$$

(e.g., de la Peña and Giné (1999), page 48). This inequality, applied with r = 1 < p, yields

$$p^{\alpha p}\sum_iE|\xi_i|^p\le 2(1+p^\alpha)\max\Bigl[p^{\alpha p}E\max_i|\xi_i|^p,\ \Bigl(\sum_iE|\xi_i|\Bigr)^p\Bigr] \tag{2.6}$$

for all $\alpha>0$. There are similar inequalities for other values of r; r = 1 is adequate for $\xi_i\ge 0$, but r = 2 is better for centered variables. If we use inequality (2.6) in (2.2''), iteratively for the last term, we obtain that, for a universal constant K (easy but cumbersome to compute), $h_{i,j}\ge 0$, $p>1$,

$$E\Bigl(\sum_{i,j}h_{i,j}\Bigr)^p\le K^p\Bigl[\Bigl(\sum_{i,j}Eh_{i,j}\Bigr)^p+p^pE_1\max_i\Bigl(\sum_jE_2h_{i,j}\Bigr)^p+p^pE_2\max_j\Bigl(\sum_iE_1h_{i,j}\Bigr)^p+p^{2p}E\max_{i,j}h_{i,j}^p\Bigr]. \tag{2.7}$$



Inequality (2.7) was obtained, up to constants, by Klass and Nowicki (1997) (it is 
their inequality (4.14)). Our proof is different, and it is contained in the proof of 
the next corollary, which extends inequality (2.7) to any m. 
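The passage from (2.5) to (2.6) used above is a short computation. The following sketch spells it out; the shorthand M and S is introduced here only for this illustration and is not part of the original notation.

```latex
% Shorthand (ours): M = E\max_i|\xi_i|^p,  S = \sum_i E|\xi_i|.
% (2.5) with r = 1 reads  \sum_i E|\xi_i|^p \le 2M + 2 S M^{(p-1)/p}.
\begin{align*}
p^{\alpha p}\sum_i E|\xi_i|^p
  &\le 2\,p^{\alpha p}M + 2\,p^{\alpha}\,S\,\bigl(p^{\alpha p}M\bigr)^{(p-1)/p}
   && \text{since } p^{\alpha p} = p^{\alpha}\,p^{\alpha(p-1)} \\
  &\le 2\max\bigl[p^{\alpha p}M,\ S^p\bigr]
     + 2\,p^{\alpha}\max\bigl[p^{\alpha p}M,\ S^p\bigr]
   && \text{since } a^{1/p}\,b^{(p-1)/p} \le \max[a,b] \\
  &= 2\,(1+p^{\alpha})\max\Bigl[p^{\alpha p}\,E\max_i|\xi_i|^p,\
       \Bigl(\sum_i E|\xi_i|\Bigr)^p\Bigr].
\end{align*}
```

In the second step the elementary inequality is applied with $a = S^p$ and $b = p^{\alpha p}M$.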

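Corollary 2.2. Under the same hypotheses as in Proposition 2.1, there exist universal constants $K_m$ such that

$$\max_{J\subseteq\{1,\dots,m\}}E_J\max_{\mathbf i_J}\Bigl(\sum_{\mathbf i_{J^c}}E_{J^c}h_{\mathbf i}\Bigr)^p\le E\Bigl(\sum_{\mathbf i}h_{\mathbf i}\Bigr)^p\le K_m^p\sum_{J\subseteq\{1,\dots,m\}}p^{|J|p}E_J\max_{\mathbf i_J}\Bigl(\sum_{\mathbf i_{J^c}}E_{J^c}h_{\mathbf i}\Bigr)^p \tag{2.8}$$

and

$$E\Bigl(\sum_{\mathbf i}h_{\mathbf i}\Bigr)^p\le K_m^p\Bigl(\frac{p}{\log p}\Bigr)^{mp}\max_{J\subseteq\{1,\dots,m\}}E_J\max_{\mathbf i_J}\Bigl(\sum_{\mathbf i_{J^c}}E_{J^c}h_{\mathbf i}\Bigr)^p. \tag{2.8'}$$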

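Proof. The left side of (2.8) follows by Hölder. Inequality (2.8') has a proof similar to that of the right hand side of (2.8), and therefore we only prove the latter. We will prove it by induction over m simultaneously with the inequality

$$p^{mp}\sum_{\mathbf i}Eh_{\mathbf i}^p\le K_m^p\sum_{J\subseteq\{1,\dots,m\}}p^{|J|p}E_J\max_{\mathbf i_J}\Bigl(\sum_{\mathbf i_{J^c}}E_{J^c}h_{\mathbf i}\Bigr)^p. \tag{2.9}$$

Let us first note that the inequalities (2.9) for 1, ..., m − 1 together with (2.2) imply (2.8). It is therefore enough to show that if (2.8) and (2.9) hold for 1, ..., m − 1 then (2.9) is satisfied for m. We will follow the notation of the proof of Proposition 2.1. Inequality (2.9) for m = 1 is just (2.6), and (2.8) for m = 1 is ($H_1$) (which also follows from ($R_1$) and (2.6)). By the induction assumptions we have

$$p^{mp}\sum_{\mathbf i}Eh_{\mathbf i}^p=p^p\sum_{i_m}E_m\,p^{(m-1)p}\sum_{\mathbf i_{\{1,\dots,m-1\}}}E_{\{1,\dots,m-1\}}h_{\mathbf i}^p\le K_{m-1}^p\sum_{J_{m-1}\subseteq\{1,\dots,m-1\}}p^{(|J_{m-1}|+1)p}\sum_{i_m}E_mE_{J_{m-1}}\max_{\mathbf i_{J_{m-1}}}\Bigl(\sum_{\mathbf i_{J^c_{m-1}}}E_{J^c_{m-1}}h_{\mathbf i}\Bigr)^p. \tag{2.10}$$

Now, by (2.6), for any $J_{m-1}\subseteq\{1,\dots,m-1\}$ we have

$$p^{(|J_{m-1}|+1)p}E_{J_{m-1}}\sum_{i_m}E_m\max_{\mathbf i_{J_{m-1}}}\Bigl(\sum_{\mathbf i_{J^c_{m-1}}}E_{J^c_{m-1}}h_{\mathbf i}\Bigr)^p\le 2(1+p)\Bigl[p^{(|J_{m-1}|+1)p}E_{J_{m-1}\cup\{m\}}\max_{\mathbf i_{J_{m-1}\cup\{m\}}}\Bigl(\sum_{\mathbf i_{J^c_{m-1}}}E_{J^c_{m-1}}h_{\mathbf i}\Bigr)^p+p^{|J_{m-1}|p}E_{J_{m-1}}\Bigl(\sum_{i_m}E_m\max_{\mathbf i_{J_{m-1}}}\sum_{\mathbf i_{J^c_{m-1}}}E_{J^c_{m-1}}h_{\mathbf i}\Bigr)^p\Bigr]. \tag{2.11}$$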



To estimate the last term we note that 
J 



P 



(2.12) 



*tti lie 

<pl^-ilf J E Jm _ 1 (^ J B J o i _ iU{m} / li ) P 



JCJm-l ' ' 



E, 



(J m -i\J)UJ^ n _ 1 U{m} 



, (^-l\flU^_ 1 Ufm) 



where in the last line we use the induction assumption (2.8) for $|J_{m-1}|<m$. Finally (2.10), (2.11) and (2.12) imply (2.9) and complete the proof. □

Remark. The proof of Proposition 2.6 below will use a version of Corollary 2.2 for nonnegative random functions taking values in $L_r$. The inequality is as follows: for p > 1 there exists $K_{m,p,r}<\infty$ such that

$$E\Bigl\|\sum_{\mathbf i}h_{\mathbf i}\Bigr\|^p\le K_{m,p,r}\max_{J\subseteq\{1,\dots,m\}}E_J\max_{\mathbf i_J}\Bigl\|\sum_{\mathbf i_{J^c}}E_{J^c}h_{\mathbf i}\Bigr\|^p. \tag{2.8''}$$

The proof is similar to the previous ones and is omitted: one takes ($H_p$) as the starting point of the induction.

Finally we come to the third step, which will extend Hoffmann-Jørgensen's inequality (H) for p > 1. If we want to use the inequalities from Corollary 2.2 to obtain boundedness of moments from stochastic boundedness of a sequence of U-statistics, we need to replace the term corresponding to $J=\emptyset$ by the p-th power of a quantile of $\sum_{\mathbf i}h_{\mathbf i}$. For this we use Paley-Zygmund's inequality (e.g., Kahane (1968) or de la Peña and Giné (1999)): if $\Lambda$ is a nonnegative random variable and $0<r<p<\infty$, then, for all $0<\lambda<1$,

$$\Pr\bigl\{\Lambda>\lambda\|\Lambda\|_r\bigr\}\ge\Bigl[(1-\lambda^r)\frac{\|\Lambda\|_r^r}{\|\Lambda\|_p^r}\Bigr]^{p/(p-r)}, \tag{2.13}$$

where $\|\Lambda\|_r=(E|\Lambda|^r)^{1/r}$ for $0<r<\infty$. Consider for instance inequality (2.8). It has the form

$$E\Lambda^p\le B+K_m^p(E\Lambda)^p,\quad p>1,$$

with $\Lambda=\sum_{\mathbf i}h_{\mathbf i}$. Then, either $B\ge K_m^p(E\Lambda)^p$, in which case we have $E\Lambda^p\le 2B$, or $B<K_m^p(E\Lambda)^p$, in which case we have $E\Lambda^p\le 2K_m^p(E\Lambda)^p$ and we can apply Paley-Zygmund's (2.13) with $\lambda=1/2$ and r = 1. It gives

$$\Pr\Bigl\{\Lambda>\frac12E\Lambda\Bigr\}\ge\frac{1}{2^{(p+1)/(p-1)}K_m^{p/(p-1)}}.$$

Hence, if we define

$$t_0=\inf\Bigl\{t>0:\ \Pr\{\Lambda>t\}<\frac{1}{2^{(p+1)/(p-1)}K_m^{p/(p-1)}}\Bigr\}, \tag{2.14}$$

we obtain $E\Lambda\le 2t_0$. So, in either case,

$$E\Lambda^p\le 2B+2^{1+p}K_m^pt_0^p.$$

Also, by Markov's inequality,

$$\frac{1}{2^{(p+1)/(p-1)}K_m^{p/(p-1)}}\,t_0^p\le E\Lambda^p.$$



We then have: 



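Theorem 2.3. Under the hypotheses of Proposition 2.1, there exist universal constants $K_m<\infty$ such that, if $t_0$ is as defined by (2.14) for $\Lambda=\sum_{\mathbf i}h_{\mathbf i}$, then

$$\frac{1}{(4K_m)^{p/(p-1)}}\,t_0^p\ \vee\max_{\substack{J\subseteq\{1,\dots,m\}\\ J\ne\emptyset}}E_J\max_{\mathbf i_J}\Bigl(\sum_{\mathbf i_{J^c}}E_{J^c}h_{\mathbf i}\Bigr)^p\le E\Bigl(\sum_{\mathbf i}h_{\mathbf i}\Bigr)^p\le(4K_m)^p\Bigl[t_0^p+\sum_{\substack{J\subseteq\{1,\dots,m\}\\ J\ne\emptyset}}p^{|J|p}E_J\max_{\mathbf i_J}\Bigl(\sum_{\mathbf i_{J^c}}E_{J^c}h_{\mathbf i}\Bigr)^p\Bigr]. \tag{2.15}$$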

A similar inequality with different constants can be obtained from (2.8'). This is the most elaborate form we will give to our bounds for $h\ge 0$ and p > 1. The right hand side of (2.15) for m = 2 becomes, disregarding constants,

$$E\Bigl(\sum_{i,j}h_{i,j}\Bigr)^p\le C\max\Bigl[E_1\max_i\Bigl(\sum_jE_2h_{i,j}\Bigr)^p,\ E_2\max_j\Bigl(\sum_iE_1h_{i,j}\Bigr)^p,\ E\max_{i,j}h_{i,j}^p,\ t_0^p\Bigr]. \tag{2.15'}$$

So, we get the p-th moment of the double sum controlled by moments of partial maxima of conditional expectations plus a quantile. The Giné-Zinn (1992) inequality (for m = 2),

$$E\Bigl(\sum_{i,j}h_{i,j}\Bigr)^p\le C\max\Bigl[E\max_i\Bigl(\sum_jh_{i,j}\Bigr)^p,\ t_0^p\Bigr],\quad p>1,$$

is slightly weaker in appearance than (2.15') (actually, we only published the result for canonical U-statistics, but we applied it as well to nonnegative variables, for which the proof is the same: see, e.g., Giné and Zhang (1996)). For applications of this inequality in the asymptotic theory of U-statistics see Giné and Zhang (1996), Giné, Kwapień, Latała and Zinn (1999) and de la Peña and Giné (1999).

Remark. The constants in the definition of $t_0$ in (2.15) depend on p, hence, so does $t_0$. This is not the case when m = 1 (as a consequence of the improved Hoffmann-Jørgensen inequality of Kwapień and Woyczyński; see de la Peña and Giné (1999), p. 11). But in most applications it does not matter whether the definition of the quantile depends on p.
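The Paley-Zygmund step (2.13) used in this argument is easy to check numerically. The sketch below evaluates both sides exactly for a finite discrete law; the function names and the toy distribution are ours and serve only as an illustration.

```python
def moment_norm(values, probs, r):
    """Return (E|X|^r)^(1/r) for a finite discrete law."""
    return sum(q * abs(v) ** r for v, q in zip(values, probs)) ** (1.0 / r)

def paley_zygmund_sides(values, probs, r, p, lam):
    """Both sides of Paley-Zygmund for a nonnegative discrete variable:
    Pr{X > lam * ||X||_r}  versus  [(1 - lam^r) ||X||_r^r / ||X||_p^r]^(p/(p-r))."""
    norm_r = moment_norm(values, probs, r)
    norm_p = moment_norm(values, probs, p)
    left = sum(q for v, q in zip(values, probs) if v > lam * norm_r)
    right = ((1 - lam ** r) * norm_r ** r / norm_p ** r) ** (p / (p - r))
    return left, right

# Toy law: X takes the values 0, 1, 4 with probabilities 0.5, 0.3, 0.2.
# The choice lam = 1/2, r = 1 is the one used in the text; p = 2 is arbitrary.
left, right = paley_zygmund_sides([0, 1, 4], [0.5, 0.3, 0.2], r=1, p=2, lam=0.5)
print(left, right)
```

In this example the left side is the probability that X exceeds half of its mean, and the right side is a lower bound built only from the first and second moments, which is exactly how the quantile $t_0$ replaces the lowest moment.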

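2.2. Canonical kernels, moments of order p > 2. If $\xi_i$ are centered and independent and p > 2, then, by convexity and the Khinchin-Bonami inequality (e.g., de la Peña and Giné, 1999, p. 113), we have

$$2^{-p}E\Bigl(\sum_i\xi_i^2\Bigr)^{p/2}\le 2^{-p}E\Bigl|\sum_i\varepsilon_i\xi_i\Bigr|^p\le E\Bigl|\sum_i\xi_i\Bigr|^p\le 2^pE\Bigl|\sum_i\varepsilon_i\xi_i\Bigr|^p\le 2^p(p-1)^{p/2}E\Bigl(\sum_i\xi_i^2\Bigr)^{p/2}, \tag{2.16}$$

where $\varepsilon_i$ are independent identically distributed Rademacher random variables, independent from $\{\xi_i\}$. Suppose $h_{\mathbf i}$ is canonical for the variables $\{X_i^{(j)}\}$ given in the previous subsection, that is, suppose

$$E_jh_{\mathbf i}\bigl(X^{(1)}_{i_1},\dots,X^{(m)}_{i_m}\bigr)=0\quad\text{a.s. for all } j=1,\dots,m,\ 1\le i_1,\dots,i_m\le n. \tag{2.17}$$

Let $\varepsilon_i^{(j)}$ be an independent Rademacher array independent of $\{X_i^{(j)}\}$, and set

$$\varepsilon_{\mathbf i}=\varepsilon^{(1)}_{i_1}\cdots\varepsilon^{(m)}_{i_m}.$$

Then, recursive application of inequality (2.16) gives

$$2^{-mp}E\Bigl(\sum_{\mathbf i}h_{\mathbf i}^2\Bigr)^{p/2}\le 2^{-mp}E\Bigl|\sum_{\mathbf i}\varepsilon_{\mathbf i}h_{\mathbf i}\Bigr|^p\le E\Bigl|\sum_{\mathbf i}h_{\mathbf i}\Bigr|^p\le 2^{mp}E\Bigl|\sum_{\mathbf i}\varepsilon_{\mathbf i}h_{\mathbf i}\Bigr|^p\le 2^{mp}(p-1)^{mp/2}E\Bigl(\sum_{\mathbf i}h_{\mathbf i}^2\Bigr)^{p/2}. \tag{2.18}$$

This inequality reduces estimation of moments of canonical U-statistics to estimation of moments of nonnegative ones (and conversely), at least if constants are not an issue. Combined with Proposition 2.1, it gives the analogue of Rosenthal's inequality for centered variables and p > 2, and if we apply it in conjunction with Corollary 2.2, we obtain the following inequality: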

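Proposition 2.4. If, for p > 2 and all $\mathbf i\in\{1,\dots,n\}^m$, $h_{\mathbf i}(X^{(1)}_{i_1},\dots,X^{(m)}_{i_m})$ is p-integrable and $E_jh_{\mathbf i}(X^{(1)}_{i_1},\dots,X^{(m)}_{i_m})=0$ a.s. for all $j=1,\dots,m$, then

$$2^{-mp}\max_{J\subseteq\{1,\dots,m\}}E_J\max_{\mathbf i_J}\Bigl(\sum_{\mathbf i_{J^c}}E_{J^c}h_{\mathbf i}^2\Bigr)^{p/2}\le E\Bigl|\sum_{\mathbf i}h_{\mathbf i}\Bigr|^p\le K_m^p\sum_{J\subseteq\{1,\dots,m\}}p^{(m+|J|)p/2}E_J\max_{\mathbf i_J}\Bigl(\sum_{\mathbf i_{J^c}}E_{J^c}h_{\mathbf i}^2\Bigr)^{p/2} \tag{2.19}$$

for universal constants $K_m<\infty$.

And, applying Paley-Zygmund with r = 2, we finally have: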
Theorem 2.5. Let $h_{\mathbf i}$ be as in Proposition 2.4, and let p > 2. Then, there exist universal constants $K_m<\infty$ such that, if $t_0$ is defined as

'3\p/(p- 2 ) 1 



to = inf 



t > : Pr|| ^/n| > ij < Q 



then 



(4K m p m / 2 )p/(p- 2 ) 
(2.20) <f?|5^Ai 



in V max 



2- mp max 

JC{l,...,m} 
.7/0 



(2^>™f/2) 1/(p - 2) 



P/2' 



< 2KP n \ (2p m / 2 )X + p( m +^P/ 2 Ej max(^ Ej.h? 



P/2' 



JC{l,...,m} 
.7/0 



If, instead of inequality (2.2), we wish to obtain an analogue of inequality (2.2'), that is, if we want to replace the constants at the right hand side of (2.19) by $(Kp/\log p)^{mp}$, then we cannot use Khinchin's inequality and must proceed directly with an induction as in Proposition 2.1 with the following change: we must consider
the variables J2i c ^' as taking values in L 2 (^m_i) and apply inequality (1.5) in 

J m-1 

Kwapień and Szulga (1991), which gives Rosenthal's inequality with best constants for centered independent random variables in Banach spaces. We skip the details.

2.3. Nonnegative kernels, moments of order p < 1. It seems impossible to obtain inequalities as simple as in the previous section for this case. However, one can still obtain inequalities that may become useful when combined with Paley-Zygmund. Here is an analogue of Corollary 2.2 for $h\ge 0$ and p < 1. The method of proof is inefficient regarding constants as Hoffmann-Jørgensen is applied twice at each step. Hence, constants will not be specified.

Proposition 2.6. Let $0<r<p<1$, $m<\infty$ and assume that the kernels $h_{\mathbf i}\ge 0$ have integrable p-th powers. Then

$$\max_{J\subseteq\{1,\dots,m\}}E_J\max_{\mathbf i_J}\Bigl(E_{J^c}\Bigl(\sum_{\mathbf i_{J^c}}h_{\mathbf i}\Bigr)^r\Bigr)^{p/r}\le E\Bigl(\sum_{\mathbf i}h_{\mathbf i}\Bigr)^p\le K_{r,p,m}\max_{J\subseteq\{1,\dots,m\}}E_J\max_{\mathbf i_J}\Bigl(E_{J^c}\Bigl(\sum_{\mathbf i_{J^c}}h_{\mathbf i}\Bigr)^r\Bigr)^{p/r}, \tag{2.21}$$

where $K_{r,p,m}$ depends only on the parameters r, p, m.

Note that all the terms in this bound represent a reduction in the number of sums except for the term corresponding to $J=\emptyset$, which consists of a power of the r-th moment of a U-statistic of order m. We will deal later with this term by means of the Paley-Zygmund argument.

Proof. The inequality at the left side of (2.21) follows from Hölder. Inequality ($H_r$) is just the right hand side of inequality (2.21) for m = 1 and we can proceed by induction. We still use the notation from Proposition 2.1. By the induction hypothesis we have



(2.22) E(y2hY = E m E {h ._ m _ 1} ( ]T 5>i 



!{!,... ,m-l} 



^ E Jm _ ± E m max ( £ 

- n ii lj m-l L : 



r~\ P/ r 



./ m -iC{l,... ,m-i} 

Let us fix $J_{m-1}\subseteq\{1,\dots,m-1\}$ and note that, for fixed $(X_i^{(j)})_{j\in J_{m-1}}$, we have

^(EE^- II 



max 



1 7C Z„ 



for suitably chosen independent r.v.'s $\xi_{i_m}$ in $\ell^\infty(L_r)$. Therefore by ($H_r$), which still holds in this space (as the norm, restricted to nonnegative vectors, is monotone increasing), we have



Ej m _ 1 E m max 



p/r 



r\ p/r 



r\ p/r 



J m-1 

< Cp^Ej^ E m max \\h im \\ p/r + (# ro || £ h ' ' ' " ' 

im 

= C P: r E Jm _ lU i m \ max (Ejo ( V" h^) ) 

'.; m _ 1 u{m) V V. — / / 

(2.23) + E Jm _, (E m max Ej. (j^ J2 hi 

Now, to estimate the last term, we note that 

( v rl P/ r 

max ( ^ 

< E Jm _, 



p/r 



E, 



(2.24) 

< Kp/r,l,\J m -i\ 



%-iU{™}(E ^) 



p/r 



x m 

./CJ,„-i 



max 



J,„_i\JUJ= 



-iU{m}( E ^ 



1 J m _l\JUJ^_ 1 U{m) 



p/r 



which follows by the version of Corollary 2.2 for $L_r$ ((2.8'') for p/r > 1). Now (2.22), (2.23) and (2.24) complete the induction step. □

To deal with the term corresponding to $J=\emptyset$ in Proposition 2.6 we apply Paley-Zygmund as above, but now with r < p replacing 1 < p. The conclusion is:
Theorem 2.7. There is a constant $K_{r,p,m}$ such that for $0<r<p<1$, $m<\infty$, and $h_{\mathbf i}\ge 0$ with integrable p-th powers, we have

\j ma,x( Ejc hj) ) 

JC{l,...,m} " ijc 



1 fPy V 

IK _U/(p-r) t V 



(2P+ 1 ^ r , p , m ) 1 /(p-'') 



(2.25) <^(^]/ii) P 



where 



to = inf 



JC{l,...,m) 
J#0 



r\ p/r 



t : Pr{j>i > t} < i(2f+ 1 7Y w „)- 1 /(P- 



Hence, the p-th moment of a U-statistic of order m can be estimated by partial moments of maxima (or sums) of conditional moments of U-statistics of lower order plus the p-th power of a quantile of the original U-statistic.

2.4. Canonical kernels and moments of order 1 < p < 2, or kernels h separately symmetric in each of the coordinates and 0 < p < 1. The canonical case reduces to the positive case by means of inequality (2.18), as before. The convexity part of inequality (2.18) fails for p < 1, but in this case, if h is symmetric separately in each of the coordinates, we can still randomize by products of independent Rademacher variables and recursive application of Khinchin's inequality still reduces this case to nonnegative h. We leave the resulting statements to the reader in order to avoid repetition.

2.5. Regular (undecoupled) general U-statistics. If $h_{\mathbf i}(x)=h_{\mathbf i\circ s}(x\circ s)$ for any permutation s of $\{1,\dots,m\}$ and $h_{\mathbf i}=0$ if $\mathbf i$ has repeated indices, and if the sequences $\{X_i^{(j)}\}_i$, $j=1,\dots,m$, are independent copies of each other, then the decoupling inequalities of de la Peña and Montgomery-Smith (1995), together with the decoupling inequality for maxima in Hitczenko (1988) in combination with the previous inequalities give moment inequalities for the generalized U-statistics

$$\sum_{1\le i_1,\dots,i_m\le n}h_{i_1,\dots,i_m}(X_{i_1},\dots,X_{i_m}),$$

where $\{X_i\}$ is a sequence of independent random variables, at the cost of vastly increasing the numerical constants (see e.g. Giné and Zinn (1992) for a similar application of the decoupling inequalities). We omit the resulting statements.

2.6. Comparison with previous results. We have already noted, below the statement of Theorem 2.3, that the inequalities there are better than the Hoffmann-Jørgensen type inequalities for U-statistics in Giné and Zinn (1992) in that they represent a decomposition into simpler quantities. Also, as mentioned in the Introduction, Ibragimov and Sharakhmetov (1998, 1999) obtained, except for constants, Proposition 2.1 and its analogue for canonical kernels for m = 2 and announced the result for general m; the final results in the present article for p > 1 in the nonnegative case (Theorem 2.3) and for p > 2 in the canonical case (Theorem 2.5), replacing some sums by maxima and lower moments by quantiles, seem to be more useful. As mentioned above, Corollary 2.2 restricted to m = 2 recovers inequalities (4.14) in Klass and Nowicki (1997). The inequalities in the last mentioned article for nonnegative kernels, p < 1 and m = 2 (the nonconvex case, inequalities (4.13) there) are different from our inequalities in Theorem 2.7 for m = 2, although they represent a similar level of decomposition of the p-th moment of the U-statistic. Basically, the difference is that they use inverses of truncated conditional moments whereas we use inverses of tail probabilities together with partial moments. This can be better seen by comparing Hoffmann-Jørgensen, which is Theorem 2.7 for m = 1, with their inequality for m = 1. The result of Klass and Nowicki (1997) can be described as the iteration of an inequality that follows from Hoffmann-Jørgensen, Paley-Zygmund ((2.13)) and (2.3), as follows. Given $\xi_i$, $i=1,\dots,n$, nonnegative, define $v_0$ as

$$v_0=\sup\Bigl\{v>0:\ \sum_iE\Bigl(\frac{\xi_i}{v}\wedge 1\Bigr)\ge 1\Bigr\} \tag{2.26}$$

or, what is the same, $v_0$ is the largest number satisfying

$$v_0=\sum_iE(\xi_i\wedge v_0). \tag{2.27}$$

Then, the inequality in question is:

Corollary 2.8. (Klass and Nowicki, 1997, Cor. 2.7) Let $\xi_i$, $i=1,\dots,n$, be independent nonnegative random variables. Then, for all p > 0,

$$E\Bigl(\sum_i\xi_i\Bigr)^p\simeq E\max_i\xi_i^p+v_0^p \tag{2.28}$$

(two-sided, up to multiplicative constants depending only on p).
Proof. Since

$$\sum_i\Pr\{\xi_i>v_0\}\le\sum_iE\Bigl(\frac{\xi_i}{v_0}\wedge 1\Bigr)=1,$$

it follows that $\delta_0\le v_0$. Therefore, if p < 1, inequality (2.3) and the definition of $v_0$ give

$$E\Bigl(\sum_i\xi_i\Bigr)^p\le\Bigl(\sum_iE(\xi_i\wedge v_0)\Bigr)^p+\sum_iE\xi_i^pI_{\xi_i>v_0}\le v_0^p+2E\max_i\xi_i^p.$$

And if p > 1, Hoffmann-Jørgensen ((H)) and the previous inequality (with p = 1) give

$$E\Bigl(\sum_i\xi_i\Bigr)^p\lesssim\Bigl(E\sum_i\xi_i\Bigr)^p+E\max_i\xi_i^p\lesssim v_0^p+E\max_i\xi_i^p.$$

For the reverse inequality, if p > 1,

$$v_0^p=\Bigl(\sum_iE(\xi_i\wedge v_0)\Bigr)^p\le E\Bigl(\sum_i\xi_i\Bigr)^p.$$

And if p < 1, following the proof of Lemma 2.2 in Klass and Nowicki (1997), we first observe that Paley-Zygmund and the first part of this proof give that for some universal constant C,



Pr^ 



(£EfeA«o))' 
^(E&Atto))' 



> 



C 



b(E(6ah,))' 
c 



4 £max(^ Ad ) 2 



> 



8' 



therefore, 



> E 



/ E(«.Ai>o)>«o/2 



> 



8 2p' 



In fact, if we bound to by t$ < 2£ l |^(£j A to) J an d a PPb/ the above proof 
to the variables £j A to, Hoffmann- J0rgensen gives the following seemingly weaker 
inequality: letting v be the parameter vo for the smaller variables £j A to (note 
«o < w o) ; then 



(2.22') 



max £f + 5^ 



3. Improved moment inequalities and exponential inequalities for m = 2



P>2, 



The right hand side of inequality (2.19) for m = 1 is just

$$E\Bigl|\sum_i\xi_i\Bigr|^p\le K^p\max\Bigl[p^pE\max_i|\xi_i|^p,\ p^{p/2}\Bigl(\sum_iE\xi_i^2\Bigr)^{p/2}\Bigr],\quad p>2, \tag{3.1}$$

where $\xi_i$ are independent mean zero random variables. These inequalities were first obtained by Pinelis (1994). Part of their interest lies in the fact that they are basically equivalent to Bernstein's inequality up to constants. Here is how (3.1) (for all p > 2) implies Bernstein's inequality up to constants. Assume $\|\xi_i\|_\infty\le A<\infty$ for all i, and set $C^2=\sum_iE\xi_i^2$. Then, (3.1) has the form

$$E\Bigl|\sum_i\xi_i\Bigr|^p\le K^p\max\bigl[p^pA^p,\ p^{p/2}C^p\bigr],\quad p>2.$$

Let

$$p=\frac{x}{KeA}\wedge\Bigl(\frac{x}{KeC}\Bigr)^2$$

for any x for which p > 2. Then, by Markov's inequality, (3.1) gives, for these values of x,

$$\Pr\Bigl\{\Bigl|\sum_i\xi_i\Bigr|>x\Bigr\}\le\begin{cases}\dfrac{K^pp^pA^p}{x^p}\le e^{-p} & \text{if } p^pA^p\ge p^{p/2}C^p,\\[2ex]\dfrac{K^pp^{p/2}C^p}{x^p}\le e^{-p} & \text{otherwise.}\end{cases}$$

Hence

$$\Pr\Bigl\{\Bigl|\sum_i\xi_i\Bigr|>x\Bigr\}\le e^2e^{-p}=e^2\exp\Bigl[-\frac{x}{KeA}\wedge\Bigl(\frac{x}{KeC}\Bigr)^2\Bigr] \tag{3.2}$$

for all x > 0. Similarly, from the iteration (2.19) of the inequalities (3.1) we can obtain exponential inequalities for generalized decoupled U-statistics of any order. However, the inequalities we obtain, while better than the existing ones, are not of the best kind, as we will see below. We illustrate this comment by considering the case m = 2. In this case, inequality (2.19) is as follows:

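$$E\Bigl|\sum_{i,j}h_{i,j}\Bigr|^p\le K^p\max\Bigl[p^p\Bigl(\sum_{i,j}Eh_{i,j}^2\Bigr)^{p/2},\ p^{3p/2}E_1\max_i\Bigl(\sum_jE_2h_{i,j}^2\Bigr)^{p/2},\ p^{3p/2}E_2\max_j\Bigl(\sum_iE_1h_{i,j}^2\Bigr)^{p/2},\ p^{2p}E\max_{i,j}|h_{i,j}|^p\Bigr]. \tag{3.3}$$

For bounded canonical kernels $h_{i,j}$ we define

$$A=\max_{i,j}\|h_{i,j}\|_\infty,\qquad C^2=\sum_{i,j}Eh_{i,j}^2,\qquad B^2=\max\Bigl[\max_j\Bigl\|\sum_iE_1h_{i,j}^2\bigl(X_i^{(1)},y\bigr)\Bigr\|_\infty,\ \max_i\Bigl\|\sum_jE_2h_{i,j}^2\bigl(x,X_j^{(2)}\bigr)\Bigr\|_\infty\Bigr]. \tag{3.4}$$

Then, we can proceed as in the deduction of (3.2) from (3.1), and easily obtain from (3.3) that there is a universal constant K such that

$$\Pr\Bigl\{\Bigl|\sum_{i,j}h_{i,j}\Bigr|>x\Bigr\}\le K\exp\Bigl\{-\frac1K\min\Bigl[\frac{x}{C},\ \Bigl(\frac{x}{B}\Bigr)^{2/3},\ \Bigl(\frac{x}{A}\Bigr)^{1/2}\Bigr]\Bigr\}. \tag{3.5}$$

This inequality also holds for regular canonical U-statistics by the decoupling inequalities of de la Peña and Montgomery-Smith (1995).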
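The mechanism that turns the moment bound (3.1) into the tail bound (3.2) can be traced numerically. In the sketch below the universal constant K is not specified in the text, so it is passed as a parameter and the value used in the example is purely hypothetical; the function names are ours.

```python
import math

def moment_bound(p, A, C, K):
    """Right side of (3.1) for bounded variables: K^p * max[(p A)^p, (sqrt(p) C)^p]."""
    return K ** p * max((p * A) ** p, (math.sqrt(p) * C) ** p)

def chosen_exponent(x, A, C, K):
    """The choice p = x/(K e A) ∧ (x/(K e C))^2 made in the text."""
    return min(x / (K * math.e * A), (x / (K * math.e * C)) ** 2)

def markov_tail(x, A, C, K):
    """Markov's inequality at the chosen p: moment bound divided by x^p."""
    p = chosen_exponent(x, A, C, K)
    return moment_bound(p, A, C, K) / x ** p

def bernstein_type_bound(x, A, C, K):
    """The bound e^2 * exp(-p), capped at 1 since it bounds a probability."""
    p = chosen_exponent(x, A, C, K)
    return min(1.0, math.e ** 2 * math.exp(-p))

# Hypothetical values: sup-norm bound A, standard deviation C, constant K.
A, C, K = 1.0, 3.0, 2.0
for x in (40.0, 60.0, 200.0):
    p = chosen_exponent(x, A, C, K)
    print(x, p, markov_tail(x, A, C, K), bernstein_type_bound(x, A, C, K))
```

For x below K e C^2 / A the quadratic term is the smaller one and the bound is Gaussian in x; above that value the linear term takes over. The factor e^2 only serves to cover the values of x for which the chosen p falls below 2, where the bound is trivial.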

Inequality (3.5) is better than the Bernstein type inequality in Arcones and Giné (1993) as it is better for $x<n^2A$ and the probability is zero for $x>n^2A$. Inequality (3.5) is suboptimal for small values of x, for which the exponent should be a constant times $-x^2$, just as for chaos variables of order 2 (see Ledoux and Talagrand (1991) and Latała (1999)). This suggests that inequality (2.19) is not of the best kind, and can be improved.

Next we improve the Rosenthal type inequality (2.19) for m = 2 (that is, (3.3)) and deduce from it an exponential inequality for canonical U-statistics of order two which does detect the Gaussian portion of the tail probability.

First we show how Talagrand's (1996) extension of Prohorov's inequality to empirical processes, actually in Massart's (1999) version, produces an improved Rosenthal's inequality for empirical processes. Then, we will use this inequality to estimate the terms resulting from conditionally applying inequality (3.1) to the U-statistic.

To describe Massart's version of Talagrand's inequality we must establish the 
setting and define some parameters. Let Zi be independent random variables with 
values in some measurable space (T, T) , let J 7 be a countable class of measurable 
real functions on T, and define 



S := sup Y,f(Zi), o 2 = su P V£(/(Z t )) 2 , a := max sup \\f(Zi)\ 



Then, 



(3.6) Prj|S| > 2E\S\ + aV8^ + 34.5aa;| < e~ x 
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for all x > 0. It follows easily from inequality (3.6) that 

$$E|S|^p \le K^p\big[(E|S|)^p + p^{p/2}\sigma^p + p^p a^p\big] \tag{3.7}$$

for some universal constant $K < \infty$ and all $p \ge 1$; in fact, inequality (3.7) for all $p$ large enough and inequality (3.6) for all $x > 0$ are equivalent up to constants. (We do not plan to keep track of constants in the derivation below and, therefore, we refrain from specifying a value for $K$ in (3.7).)
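
One way to see how (3.7) follows from (3.6) is to integrate the tail bound. The following is only a sketch: the auxiliary symbols $W$, $\varphi$ and $T$ are introduced here for illustration and do not appear in the text, and it is assumed that $\sigma$ and $a$ are not both zero, so that $\varphi$ is an increasing bijection of $[0,\infty)$.

```latex
% Sketch: (3.6) implies (3.7). W, \varphi, T are illustrative notation only.
% Uses \Gamma(p/2+1) \le p^{p/2} and \Gamma(p+1) \le p^{p}, valid for p \ge 1.
\begin{align*}
W &:= (|S| - 2E|S|)_+, \qquad \varphi(x) := \sigma\sqrt{8x} + 34.5\,a x, \qquad T := \varphi^{-1}(W).\\
\intertext{By (3.6), $\Pr\{T \ge x\} = \Pr\{W \ge \varphi(x)\} \le e^{-x}$ for all $x > 0$, so for every $q > 0$}
E\,T^{q} &= \int_0^\infty q x^{q-1}\Pr\{T \ge x\}\,dx \le \int_0^\infty q x^{q-1}e^{-x}\,dx = \Gamma(q+1).\\
\intertext{Since $W = \sigma\sqrt{8T} + 34.5\,aT$ and $(u+v)^p \le 2^{p-1}(u^p + v^p)$ for $p \ge 1$,}
E\,W^{p} &\le 2^{p-1}\bigl[8^{p/2}\sigma^{p}\,\Gamma(p/2+1) + 34.5^{p}a^{p}\,\Gamma(p+1)\bigr]
          \le 2^{p-1}\bigl[8^{p/2}p^{p/2}\sigma^{p} + 34.5^{p}p^{p}a^{p}\bigr],\\
E|S|^{p} &\le 2^{p-1}\bigl[(2E|S|)^{p} + E\,W^{p}\bigr]
          \le K^{p}\bigl[(E|S|)^{p} + p^{p/2}\sigma^{p} + p^{p}a^{p}\bigr].
\end{align*}
```

The last step only collects numerical factors into a universal $K$, in line with the decision above not to track constants.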

Proposition 3.1. Let $\{Z_i\}$ be as above, let $\mathcal{F}$ be a countable class of functions such that $Ef^2(Z_i) < \infty$ and $Ef(Z_i) = 0$ for all $i$. Then, in the notation from the previous paragraph,

$$E|S|^p \le K^p\Big[(E|S|)^p + p^{p/2}\sigma^p + p^p E\max_i\sup_{f\in\mathcal{F}}|f(Z_i)|^p\Big] \tag{3.8}$$

for all $p \ge 1$, where $K$ is a universal constant.

Proof. Set $F := \sup_{f\in\mathcal{F}}|f|$ and $M^p := 8\cdot 3^p E\max_i|F(Z_i)|^p$. Since the variables $f(Z_i)$ are centered, we can randomize by independent Rademacher variables $\varepsilon_i$ independent of the $Z$ variables (at the price of increasing the value of the constant $K$). Set $\tilde S := \sup_f|\sum_i\varepsilon_i f(Z_i)|$. Then,

$$|\tilde S| \le \sup_f\Big|\sum_i\varepsilon_i f(Z_i)I_{F(Z_i)\le M}\Big| + \sup_f\Big|\sum_i\varepsilon_i f(Z_i)I_{F(Z_i)>M}\Big| := \tilde S_1 + \tilde S_2,$$

and notice that, since $E\tilde S_1^p \le 2^{p+1}E|S|^p$ (e.g., Lemmas 1.2.6 and 1.4.3 in de la Peña and Giné, 1999), inequality (3.7) gives

$$E\tilde S_1^p \le K^p\big[(E|S|)^p + p^{p/2}\sigma^p + p^pM^p\big].$$

To estimate $E\tilde S_2^p$ we apply the original Hoffmann-Jørgensen inequality (from e.g., Ledoux and Talagrand (1991), (6.9) in page 156) to get

$$E\tilde S_2^p \le 2\cdot 3^p\big(t_0^p + E\max_i F(Z_i)^p\big),$$

where $t_0$ is any number such that $\Pr\{\tilde S_2 > t_0\} \le (8\cdot 3^p)^{-1}$. But the choice of $M$ implies that we can take $t_0 = 0$ because

$$\Pr\{\tilde S_2 > 0\} \le \Pr\{\max_i F(Z_i) > M\} \le \frac{E\max_i F(Z_i)^p}{M^p} = \frac{1}{8\cdot 3^p},$$

proving the proposition. □
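
For the reader's convenience, the way the two estimates of the proof combine can be sketched as follows. This is a hedged summary rather than part of the original argument: $K'$ is an illustrative name for the final constant, the bound for the truncated part $\tilde S_1$ is taken in the form $K^p[(E|S|)^p + p^{p/2}\sigma^p + p^pM^p]$, and the desymmetrization step $E|S|^p \le 2^pE\tilde S^p$ is the standard one for centered summands.

```latex
% Sketch: combining the bounds on \tilde S_1 and \tilde S_2 (K' is illustrative notation).
\begin{align*}
E\tilde S_2^{p} &\le 2\cdot 3^{p}\,E\max_i F(Z_i)^{p} \quad (t_0 = 0),
\qquad p^{p}M^{p} = 8\cdot 3^{p}\,p^{p}\,E\max_i F(Z_i)^{p},\\
E|S|^{p} \le 2^{p}E\tilde S^{p} &\le 2^{2p-1}\bigl(E\tilde S_1^{p} + E\tilde S_2^{p}\bigr)
 \le (K')^{p}\Bigl[(E|S|)^{p} + p^{p/2}\sigma^{p} + p^{p}\,E\max_i\sup_{f}|f(Z_i)|^{p}\Bigr].
\end{align*}
```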

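In what follows we will assume, just as above, that the kernels $h_{i,j}$, $i,j \le n$, are completely degenerate and define

$$D = \|(h_{i,j})\|_{L_2\to L_2} := \sup\Big\{E\sum_{i,j}h_{i,j}(X_i^{(1)},X_j^{(2)})f_i(X_i^{(1)})g_j(X_j^{(2)}) : E\sum_i f_i^2(X_i^{(1)}) \le 1,\ E\sum_j g_j^2(X_j^{(2)}) \le 1\Big\}. \tag{3.9}$$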



Theorem 3.2. There exists a universal constant $K < \infty$ such that, if $h_{i,j}$ are bounded canonical kernels of two variables for the independent random variables $X_i^{(1)}$, $X_j^{(2)}$, $i,j = 1,\dots,n$, $n \in \mathbb{N}$, then

$$E\Big|\sum_{1\le i,j\le n}h_{i,j}(X_i^{(1)},X_j^{(2)})\Big|^p \le K^p\Big[p^{p/2}C^p + p^pD^p + p^{3p/2}\Big(E_1\max_i\Big(\sum_jE_2h_{i,j}^2\Big)^{p/2} + E_2\max_j\Big(\sum_iE_1h_{i,j}^2\Big)^{p/2}\Big) + p^{2p}E\max_{i,j}|h_{i,j}|^p\Big] \tag{3.10}$$

for all $p \ge 2$.

Inequality (3.10) is strictly better than the right hand side inequality in (2.9) for $m = 2$, that is, than (3.3).

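Proof. Inequality (3.1) applied conditionally on the variables $X_i^{(1)}$ gives

$$E\Big|\sum_{i,j}h_{i,j}\Big|^p \le K^p\Big(p^{p/2}E_1\Big(\sum_jE_2\Big(\sum_ih_{i,j}\Big)^2\Big)^{p/2} + p^pE_2\sum_jE_1\Big|\sum_ih_{i,j}\Big|^p\Big). \tag{3.11}$$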

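To bound the first summand at the right hand side of (3.11) we first notice that

$$\Big(\sum_jE_2\Big(\sum_ih_{i,j}(X_i^{(1)},X_j^{(2)})\Big)^2\Big)^{1/2} = \sup\Big\{\sum_i\sum_jE_2\big[h_{i,j}(X_i^{(1)},X_j^{(2)})f_j(X_j^{(2)})\big] : \sum_jEf_j^2(X_j^{(2)}) \le 1\Big\},$$

where in fact, the sup is taken only over a countable subset of mean zero vector functions $(f_1,\dots,f_n)$ dense in the unit ball of $L_2(\mathcal{L}(X_1^{(2)}))\times\cdots\times L_2(\mathcal{L}(X_n^{(2)}))$ for the seminorm $|(f_j)_{j\le n}| = \big(\sum_jEf_j^2(X_j^{(2)})\big)^{1/2}$. [To see this, first apply duality in $\ell_2$ and then in $L_2(\mathcal{L}(X_j^{(2)}))$ for each $j$.] So we can apply (3.8) to $Z_i = (h_{i,j})_{j=1}^n$ with $f(Z_i) = E_2\sum_jh_{i,j}(X_i^{(1)},X_j^{(2)})f_j(X_j^{(2)})$. In this case, the right hand side terms in (3.8) can be estimated as follows. The first term: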



$$(E|S|)^2 \le E|S|^2 = E_1\sum_jE_2\Big(\sum_ih_{i,j}\Big)^2 = \sum_{i,j}Eh_{i,j}^2 = C^2.$$

For the second we see that, since, by the previous duality argument,

$$\sum_iE_1\Big(E_2\sum_jh_{i,j}(X_i^{(1)},X_j^{(2)})f_j(X_j^{(2)})\Big)^2 \le \|(h_{i,j})\|^2_{L_2\to L_2} = D^2,$$

it follows that $\sigma \le D$. The third term:

$$E\max_i\sup_f|f(Z_i)|^p = E_1\max_i\sup_{\sum_jEf_j^2\le 1}\Big|E_2\sum_jh_{i,j}(X_i^{(1)},X_j^{(2)})f_j(X_j^{(2)})\Big|^p \le E_1\max_i\sup_{\sum_jEf_j^2\le 1}\Big[\Big(\sum_jE_2h_{i,j}^2\Big)^{1/2}\Big(\sum_jEf_j^2\Big)^{1/2}\Big]^p \le E_1\max_i\Big(\sum_jE_2h_{i,j}^2\Big)^{p/2}.$$


Thus, inequality (3.8) gives 

$$p^{p/2}E_1\Big(\sum_jE_2\Big(\sum_ih_{i,j}\Big)^2\Big)^{p/2} \le K^p\Big[p^{p/2}C^p + p^pD^p + p^{3p/2}E_1\max_i\Big(\sum_jE_2h_{i,j}^2\Big)^{p/2}\Big]. \tag{3.12}$$



To estimate the second summand at the right hand side of (3.11), we apply (3.1) 
once more and obtain 



$$p^pE_2\sum_jE_1\Big|\sum_ih_{i,j}\Big|^p \le K^p\Big[p^{3p/2}\sum_jE_2\Big(\sum_iE_1h_{i,j}^2\Big)^{p/2} + p^{2p}\sum_{i,j}E|h_{i,j}|^p\Big]. \tag{3.13}$$



Thus, to complete the proof of the theorem it suffices to replace the sum in j and 
the sum in i, j respectively by maxima in j and in i, j on the terms at the right 
hand side of this inequality. But this is an easy exercise of application of inequality 
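(2.6). For completeness' sake, here it is. Applying (2.6) with $\alpha = 3$ and $p/2$ instead of $p$, the first term at the right of (3.13) bounds as:

$$p^{3p/2}\sum_jE_2\Big(\sum_iE_1h_{i,j}^2\Big)^{p/2} \le 2^{1+3p/2}\big(1+(p/2)^3\big)\Big[\Big(\frac{p}{2}\Big)^{3p/2}E_2\max_j\Big(\sum_iE_1h_{i,j}^2\Big)^{p/2} + C^p\Big],$$

which produces the conversion of the sum into a maximum without increasing the order of the multiplicative constant in front of $C^p$. The second term in (3.13) requires two steps. First, we apply (2.6) for $p/2$ and $\alpha = 4$, conditionally on the variables $X_i^{(1)}$:

$$p^{2p}\sum_{i,j}E|h_{i,j}|^p \le 2^{2p+1}\big(1+(p/2)^4\big)E_1\sum_i\Big[\Big(\frac{p}{2}\Big)^{2p}E_2\max_j|h_{i,j}|^p + \Big(\sum_jE_2h_{i,j}^2\Big)^{p/2}\Big]. \tag{3.14}$$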



We apply (2.6) with respect to $E_1$, for $p/2$ and $\alpha = 0$, to the second term at the right hand side of (3.14) and we obtain the bound

$$2^{2p+3}\big(1+(p/2)^4\big)\Big[E_1\max_i\Big(\sum_jE_2h_{i,j}^2\Big)^{p/2} + C^p\Big],$$

which is in terms of some of the quantities appearing at the right hand side of (3.10) and with coefficients of lower order. As for the first term at the right of (3.14), we apply (2.6) with respect to $E_1$, again for $p/2$ and $\alpha = 4$, and get it bounded by



$$2^{4p+2}\big(1+(p/2)^4\big)^2\Big[\Big(\frac{p}{2}\Big)^{2p}E\max_{i,j}|h_{i,j}|^p + E_2\Big(\sum_iE_1\max_jh_{i,j}^2\Big)^{p/2}\Big].$$

Here the first term coincides with the last one in (3.10), and the second is dominated by

$$E_2\Big(\sum_{i,j}E_1h_{i,j}^2\Big)^{p/2}.$$

Applying inequality ($R_1$) with respect to $E_2$ this is in turn dominated by

$$K^p\Big[\Big(\frac{p}{2}\Big)^{p/2}\sum_jE_2\Big(\sum_iE_1h_{i,j}^2\Big)^{p/2} + \Big(\sum_{i,j}Eh_{i,j}^2\Big)^{p/2}\Big],$$

and the first summand has already been handled above (first term at the right of (3.13)). Collecting terms we obtain inequality (3.10). □

Theorem 3.2 gives the following moment inequality and exponential bound for 
bounded kernels. 

Theorem 3.3. There exist universal constants $K < \infty$ and $L < \infty$ such that, if $h_{i,j}$ are bounded canonical kernels of two variables for the independent random variables $X_i^{(1)}$, $X_j^{(2)}$, $i,j = 1,\dots,n$, and if $A$, $B$, $C$, $D$ are as defined in (3.4) and (3.9), then

$$E\Big|\sum_{1\le i,j\le n}h_{i,j}(X_i^{(1)},X_j^{(2)})\Big|^p \le K^p\big[p^{p/2}C^p + p^pD^p + p^{3p/2}B^p + p^{2p}A^p\big] \tag{3.15}$$

for all $p \ge 2$ and, equivalently,

$$\Pr\Big\{\Big|\sum_{i,j\le n}h_{i,j}(X_i^{(1)},X_j^{(2)})\Big| \ge x\Big\} \le L\exp\Big[-\frac{1}{L}\min\Big(\frac{x^2}{C^2},\ \frac{x}{D},\ \frac{x^{2/3}}{B^{2/3}},\ \frac{x^{1/2}}{A^{1/2}}\Big)\Big] \tag{3.16}$$

for all $x > 0$.

The moment inequality is immediate from Theorem 3.2 and the equivalence with 
the exponential inequality follows just like (3.2) follows from (3.1) in one direction, 
and, in the other, by integration of tail probabilities. 
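
The direction from the moment inequality to the exponential inequality can be sketched as follows. This is an illustration under stated assumptions, not the authors' computation: the symbols $U$, $m(x)$ and $c$ are introduced here, $A$, $B$, $C$, $D$ are assumed positive, and $K \ge 1$ is assumed without loss of generality.

```latex
% Sketch: (3.15) implies (3.16). U, m(x), c are illustrative notation; K >= 1 is assumed.
\begin{align*}
U &:= \sum_{i,j\le n}h_{i,j}(X_i^{(1)},X_j^{(2)}), \qquad
m(x) := \min\Bigl(\frac{x^{2}}{C^{2}},\ \frac{x}{D},\ \frac{x^{2/3}}{B^{2/3}},\ \frac{x^{1/2}}{A^{1/2}}\Bigr), \qquad c := 4eK.\\
\intertext{Markov's inequality and (3.15) give, for every $p \ge 2$,}
\Pr\{|U| \ge x\} &\le \frac{E|U|^{p}}{x^{p}}
 \le \Bigl(\frac{4K\max\bigl(p^{1/2}C,\ pD,\ p^{3/2}B,\ p^{2}A\bigr)}{x}\Bigr)^{p}.\\
\intertext{Take $p := m(x)/c^{2}$. Since $c \ge 1$, each of $p^{1/2}C$, $pD$, $p^{3/2}B$, $p^{2}A$ is at most $x/c$, so if $p \ge 2$,}
\Pr\{|U| \ge x\} &\le \Bigl(\frac{4K}{c}\Bigr)^{p} = e^{-p} = \exp\Bigl(-\frac{m(x)}{(4eK)^{2}}\Bigr).\\
\intertext{If $p < 2$ the bound $\Pr\{|U| \ge x\} \le 1 \le e^{2}e^{-p}$ is trivial, so in all cases}
\Pr\{|U| \ge x\} &\le e^{2}\exp\Bigl(-\frac{m(x)}{(4eK)^{2}}\Bigr), \qquad x > 0.
\end{align*}
```

Under these assumptions one admissible choice in (3.16) is $L = (4eK)^2$; no attempt is made here to optimize it.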

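Next we comment on the exponential inequality. For comparison purposes, let $h_{i,j}(X_i^{(1)},X_j^{(2)}) = g_ig_j'x_{i,j}$ with $g_i$, $g_j'$ independent standard normal. In this case,

$$C^2 = \sum_{i,j}x_{i,j}^2 \quad\text{and}\quad D = \sup\Big\{\sum_{i,j}u_iv_jx_{i,j} : \sum_iu_i^2 \le 1,\ \sum_jv_j^2 \le 1\Big\}$$

and the Gaussian chaos inequality in Latała (1999) yields the existence of universal constants $0 < k < K < \infty$ such that

$$\Pr\Big\{\Big|\sum_{i,j}x_{i,j}g_ig_j'\Big| \ge K(Cx^{1/2} + Dx)\Big\} \le e^{-x}$$

and

$$\Pr\Big\{\Big|\sum_{i,j}x_{i,j}g_ig_j'\Big| \ge k(Cx^{1/2} + Dx)\Big\} \ge k\wedge e^{-x}.$$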



By the central limit theorem for canonical U-statistics, this implies that the coefficients of $x^2$ and $x$ in (3.16) are correct (except for $K$). It is natural to have terms in smaller powers of $x$ in (3.16), e.g., by comparison with Bernstein's inequality for sums of independent random variables. In fact, the term in $x^{1/2}$ cannot be avoided, at least up to logarithmic factors. To see this, consider the product $V$ of two independent centered Poisson variables with parameter 1, which is the limit in law of $V_n = \sum_{i,j\le n}X_i^{(n)}Y_j^{(n)}$ where $X_i^{(n)}$ and $Y_j^{(n)}$ are centered Bernoulli random variables with parameter $p = 1/n$; then, for large $x$, the tail probabilities of $V$ are of the order of $\exp(-x^{1/2}\log x)$, and therefore, so are those of $V_n$ for large $n$. Also, note that the term in $x^{2/3}$ in the exponent corresponds, up to logarithmic factors, to the tail probabilities of the product of two independent random variables, one normal and the other centered Poisson.
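
The stated order of the tail of $V$ can be checked from the lower-bound side as follows. This is a sketch with illustrative notation: $N$, $N'$ denote independent Poisson variables with parameter 1, so that $V = (N-1)(N'-1)$, and $k_x$ is an integer threshold introduced here. The matching upper bound, which needs a union bound over the possible values of one factor, is not reproduced.

```latex
% Sketch: lower bound for the tail of V = (N-1)(N'-1); N, N', k_x are illustrative notation.
\begin{align*}
\frac{e^{-1}}{k!} = \Pr\{N = k\} &\le \Pr\{N \ge k\}
 \le \frac{e^{-1}}{k!}\sum_{m\ge 0}(k+1)^{-m} \le \frac{1}{k!}, \qquad k \ge 1,\\
\intertext{so, by Stirling's formula $\log k! = k\log k - k + O(\log k)$, we get $\Pr\{N \ge k\} = \exp\bigl(-k\log k\,(1+o(1))\bigr)$. With $k_x := \lceil x^{1/2}\rceil + 1$,}
\Pr\{V \ge x\} &\ge \Pr\{N \ge k_x\}\,\Pr\{N' \ge k_x\}
 = \exp\bigl(-2k_x\log k_x\,(1+o(1))\bigr)
 = \exp\bigl(-x^{1/2}\log x\,(1+o(1))\bigr).
\end{align*}
```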

If the variables $X_i^{(1)}$, $X_j^{(2)}$ are i.i.d., $h_{i,j} = h$ for all $i,j$ and $h$ is completely degenerate, then the parameters defined by (3.4) and (3.9) become:

$$A = \|h\|_\infty,\qquad B^2 = n\big(\|E_Yh^2(x,Y)\|_\infty + \|E_Xh^2(X,y)\|_\infty\big),\qquad C^2 = n^2Eh^2$$

and

$$D = n\sup\{Eh(X,Y)f(X)g(Y) : Ef^2(X) \le 1,\ Eg^2(Y) \le 1\} := n\|h\|_{L_2\to L_2},$$

where $\|h\|_{L_2\to L_2}$ is the norm of the operator of $L_2(\mathcal{L}(X))$ with kernel $h$. Then, inequalities (3.15) and (3.16) become:

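Corollary 3.4. Under the above assumptions, there exist universal constants $K < \infty$, $L < \infty$ such that, for all $n \in \mathbb{N}$ and $p \ge 2$,

$$E\Big|\sum_{i,j\le n}h(X_i^{(1)},X_j^{(2)})\Big|^p \le K^p\Big[p^{p/2}n^p(Eh^2)^{p/2} + p^pn^p\|h\|^p_{L_2\to L_2} + p^{3p/2}n^{p/2}\big(\|E_Yh^2\|_\infty + \|E_Xh^2\|_\infty\big)^{p/2} + p^{2p}\|h\|_\infty^p\Big] \tag{3.17}$$

and

$$\Pr\Big\{\Big|\sum_{i,j\le n}h(X_i^{(1)},X_j^{(2)})\Big| \ge x\Big\} \le K\exp\Big[-\frac{1}{K}\min\Big(\frac{x^2}{n^2Eh^2},\ \frac{x}{n\|h\|_{L_2\to L_2}},\ \frac{x^{2/3}}{\big[n\big(\|E_Yh^2\|_\infty + \|E_Xh^2\|_\infty\big)\big]^{1/3}},\ \frac{x^{1/2}}{\|h\|_\infty^{1/2}}\Big)\Big]. \tag{3.18}$$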



Inequality (3.18) provides an analogue of Bernstein's inequality for degenerate U-statistics of order 2. Note that inequalities (3.15), (3.16), (3.17) and (3.18) can all be 'undecoupled' using the result of de la Peña and Montgomery-Smith (1995). It should also be noted that this exponential inequality for canonical U-statistics is strong enough to imply the sufficiency part of the law of the iterated logarithm for these objects: this can be seen by applying it to the kernels $h_n$ in Steps 7 and 8 of the proof of Theorem 3.1 in Giné, Kwapień, Latała and Zinn (1999) (and using some of the computations there for the parameters $C$ to $D$). Neither inequality (3.5) nor any of the previously published inequalities for U-statistics can do this.
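
To make the analogy with Bernstein's inequality concrete, recall that for independent centered random variables $\xi_i$ with $|\xi_i| \le a$ and $\sigma^2 = \sum_iE\xi_i^2$, the classical bound can be put in the same "minimum" form. The sketch below uses $\xi_i$, $a$ and $\sigma$ in this classical sense only (they are not parameters of the U-statistic), and its second line records where the first two terms in the exponent of (3.18) cross over.

```latex
% Sketch: Bernstein's inequality in "min" form, and the crossover of the first two terms of (3.18).
\begin{align*}
\Pr\Bigl\{\Bigl|\sum_i\xi_i\Bigr| \ge x\Bigr\}
 &\le 2\exp\Bigl(-\frac{x^{2}}{2\sigma^{2} + 2ax/3}\Bigr)
 \le 2\exp\Bigl(-\frac{1}{4}\min\Bigl(\frac{x^{2}}{\sigma^{2}},\ \frac{3x}{a}\Bigr)\Bigr),\\
\frac{x^{2}}{n^{2}Eh^{2}} \le \frac{x}{n\|h\|_{L_2\to L_2}}
 &\iff x \le \frac{n\,Eh^{2}}{\|h\|_{L_2\to L_2}}.
\end{align*}
```

So, among the first two terms of (3.18), the Gaussian term $x^2/(n^2Eh^2)$ is the smaller one exactly for $x$ up to $nEh^2/\|h\|_{L_2\to L_2}$.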